An XAI method for convolutional neural networks in self-driving cars

eXplainable Artificial Intelligence (XAI) is a recent trend in machine learning. Machine learning models are used to make predictions or decisions, and they derive their output from a large volume of data. The problem is that it is hard to know why a given prediction was derived, especially with deep learning models. This makes the models unreliable for reliability-critical applications, so it is necessary to explain how they derived their output. Self-driving cars are such a reliability-critical application, because mistakes made by the computers inside them can lead to critical accidents. It is therefore necessary to adopt XAI models in this field. In this paper, we propose an XAI method based on computing and explaining the difference of the output values of the neurons in the last hidden layer of convolutional neural networks. First, we input the original image and some modified versions of it. Then we derive the output values for each image and compare them. We use the Sensitivity Analysis technique to explain which parts of the original image are needed to distinguish the category. In detail, we divide the image into several parts, compute for each part its influence value on the vector of the last hidden layer of the model, and then fill each part with a shade whose darkness is in proportion to its influence value. The experimental results show that our approach for XAI in self-driving cars accurately finds the parts needed to distinguish the category of these images.


Introduction
eXplainable Artificial Intelligence (XAI) [1] is a field of machine learning. Its goal is to explain how machine learning models derive their outputs. Applying XAI to a machine learning model can make the model more reliable, because we can track its inference process. It is crucial to adopt XAI for self-driving cars, because a misjudgment by the image recognition model used in them can lead to deaths. Suppose that we input a picture with the blue sky at the top and the road at the bottom. The explanation may then show that the model classified the picture as a straight road because of the blue sky at the top, not the road at the bottom. Because self-driving cars need computer vision techniques, Convolutional Neural Networks (CNN) can be used for object detection. In short, we need XAI methods for the convolutional neural network of the model used for self-driving cars. We can classify XAI methods into three major categories: Sensitivity Analysis (SA) [2], Layer-wise Relevance Propagation (LRP) [3], and Feature Importance [4]. We focus on the Sensitivity Analysis method for explaining our convolutional neural network model. It is a method that estimates the influence of each input variable. First, we set each input variable to a specific value and input the modified input vector to the model. Then we measure how much the resulting output vector differs from the output vector for the original input vector. In this way, we can see which elements of the input vector strongly influence the output vector, and therefore which part of the input makes the model decide rightly or wrongly. [5] describes five terms about XAI: understandability, comprehensibility, interpretability, explainability and transparency. Following this study, we need transparency for our model, because one major goal of our study is to prevent accidents that could be caused by self-driving cars.
Because we use the Sensitivity Analysis method, which simulates the layer output of our CNN model, the category of transparency of our XAI model is simulatability. Because our model is a CNN model, which is not readily interpretable, we should use post-hoc explainability [5]. Our model visualizes the influence of changes in the inputs, and we run our XAI model on car-related images such as vehicle and non-vehicle images, so the categories of post-hoc explainability that fit the model are visual explanation and explanation by example. In addition, [5] describes some goals of XAI, such as trustworthiness, causality and transferability. Accordingly, the major goal of our XAI model is trustworthiness, because we aim to make a CNN model more trustworthy so that we can prevent potential accidents. Consequently, our XAI model gives transparency to the CNN model through simulatability, with example images and visualized explanations of them, to increase the trustworthiness of the CNN model.
There are several previous studies on XAI methods for CNN models. SHAP [6] uses feature importance values for each feature and computes these values by comparing images that include and do not include the feature. LIME [7] uses an interpretable model that learns from instances sampled from a local area. Grad-CAM [8] uses some counterfactual explanations to change the prediction of the CNN, and it can explain any layer, including the last hidden layer. eXplainable CNN (XCNN) [9] uses a network with a heatmap generator of encoder-decoder architecture and creates explanations using the output of this architecture. [10] generates visual explanations using the weighted sum of the feature masks. Two metrics (insertion and deletion) are used for weight computation, using both similarity difference and uniqueness values. [11] makes a deep convolutional neural network (DCNN) learn from relevant and irrelevant features. It also uses a denoising algorithm and gradient attribution. [12] tries to find the reasons for classification errors using multiple methods and visualizes the last convolutional layer. [13] uses LIME [7] for radar images, with a CNN including various kinds of Keras layers such as ReLU, Batch Normalization, and Flatten. [14] uses a multi-leveled Layer-wise Relevance Propagation (LRP) called the Deep Taylor Method for medical images. It experiments on two popular image detection models, ResNet-50 and VGG-16. [15] uses an attribution mask derived from input images, and derives a layer visualization map and attribution mask scoring based on the point-wise multiplication between the image and the mask. [16] compares and evaluates LRP with some relevance maps. It compares the average relevance maps and the topo-plots for binary masks for various methods, including LRP-based methods. [17] uses saliency mapping by adding Gaussian noise to the input images.
Its data set contains many kinds of galaxy images, and it uses some data augmentation techniques such as random rotations and flips. [18] finds that natural images are more helpful than synthetic images for providing information about the feature map of a CNN. [19] uses a CNN model including Grad-CAM [8] and a Gated Recurrent Unit (GRU) for traffic accident anticipation. [20] evaluates how much reliable information each type of explanation provides to people under three different conditions. [21] uses SLRP (a modified version of Layer-wise Relevance Propagation) to find the propagated relevances for each layer of deep learning models (CNN and RNN) that detect the category of the objects in images. [22] applies Class Activation Mapping (CAM) to a CNN, using the weighted sum of image filters of the same size. It uses F-measure and AUC as evaluation metrics. [23] uses XAI methods such as Grad-CAM and GuidedBP, and its data set contains many mathematical symbols and their combinations. [24] introduces XAI software, TorchPRISM, and uses methods such as PCA (principal component analysis) and bilinear interpolation. [25] uses selected templates corresponding to the feature maps, and represents each category using the set of positive templates. Some methods [6,8] use the changes or differences of the input values of a neural network (refer to the comparison table, Table 1). Our method uses these changes directly and does not include any complex formulas or algorithms, so it is the simplest among the methods that use these changes.
Nowadays, many electronic devices we encounter in daily life contain machine learning algorithms. Some of these devices can cause a critical accident if the decisions made by these algorithms are wrong. One of them is the self-driving car. Deep learning models for decision making usually do not provide "explanations" of why they made their decisions, and we do not know the exact value of each neuron in these models. So, the problem is that we cannot know 'why' the models made these decisions. In other words, without XAI techniques we cannot know which parts of the input image led the model to its prediction. For example, suppose the decision-making model of a self-driving car classified an image as a 'straight road' by using the upper part of the image with the blue sky, not the lower part with the road. In this situation, the model accuracy may appear high because it classified the image as a straight road. But when the input image contains only the blue sky, the model can wrongly classify it as a 'straight road' even though it does not include a road. The solution for this situation is eXplainable Artificial Intelligence (XAI).
Our approach has the four stages below:
• modifying each part of the original input image
• inputting each modified image to the model
• deriving the output
• comparing the output with the output when the input is the original image
In detail, we divide the entire image into many rectangular sub-images with the same width and height. We call each divided sub-image a part, and then we make each modified image by filling one part of the original input image with black (RGB 0,0,0). By doing this, we can measure the influence of the change of each part of the input image on the final output. The main contribution of our method is that we can make meaningfully accurate explanations for the result in a relatively simple way. The related work includes many complex methods that try to make an explanation for an image. [6,8] also generate explanations accurately, but they use Kernel SHAP and complex mathematical operations, and these are not simpler than our method.

Table 1. Method comparison of each related paper.

| category of the main method | papers |
| --- | --- |
| difference of the input values of neural network | [6,8] |
| LIME based methods | [7,13] |
| feature (or feature masks) | [10,11,15,22,25] |
| LRP-based methods | [14,16,21] |
| others | [9, 12, 17-20, 23, 24] |

Overview
A brief description of our methodology is given in the flow chart in Fig 1. First, we perform the gray-scaling stage on the images and the pre-training stage of the CNN model before the main XAI algorithm. Because the pre-training stage is not directly related to XAI, we explain it after the main XAI algorithm, in the Experiments and Discussion section. Next, we go to the main XAI algorithm, which includes four stages (steps). First, modifying image makes changes to the original image so that the changed parts can be explained. The algorithm performs it for each part of the image. Second, finding vector inputs the original image and the modified images into the network and gets the output vector of the last hidden layer for each image. Third, computing difference compares the output vector of the last hidden layer for the original image (original output vector) with that for each modified image (modified output vectors), and computes the difference between the original output vector and each modified output vector using Euclidean distance. Last, making explanations fills each part of a copy of the original image in proportion to the difference between the original output vector and the corresponding modified output vector, as computed in computing difference. In practice, there is no problem with getting the output vector for the original image in finding vector before performing modifying image, and in this paper we proceeded in this way. For the pre-training stage, we used the CNN model described in Fig 2. We used TensorFlow [26] for model training.
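The four stages above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_last_hidden` is a hypothetical stand-in for the last hidden layer of the pre-trained CNN, the assignment of M to rows and N to columns is an assumption, and the scaling of differences to the range 0 to 1 in `to_opacity` is an assumed normalization for the making explanations step.

```python
import numpy as np


def fill_black(img, y0, y1, x0, x1):
    """Modifying image: copy img and paint rows y0:y1, columns x0:x1 black."""
    out = img.copy()
    out[y0:y1, x0:x1] = 0.0
    return out


def influence_map(img, last_hidden, M=8, N=8):
    """Finding vector and computing difference for every part of the image.

    last_hidden is any callable mapping an image to a 1-D vector; here it
    stands in for the last hidden layer of the pre-trained CNN.
    Assumption: M counts parts along the height and N along the width.
    """
    H, W = img.shape[:2]
    h, w = H // M, W // N              # height and width of one part
    original = last_hidden(img)        # original output vector
    dif = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            modified = fill_black(img, m * h, (m + 1) * h, n * w, (n + 1) * w)
            # Euclidean distance between modified and original output vectors
            dif[m, n] = np.linalg.norm(last_hidden(modified) - original)
    return dif


def to_opacity(dif):
    """Making explanations: scale differences to [0, 1] (assumed normalization)."""
    peak = dif.max()
    return dif / peak if peak > 0 else np.zeros_like(dif)


def toy_last_hidden(img):
    """Hypothetical stand-in model: mean brightness of the four quadrants."""
    H, W = img.shape
    top, bottom = img[: H // 2], img[H // 2:]
    return np.array([
        top[:, : W // 2].mean(), top[:, W // 2:].mean(),
        bottom[:, : W // 2].mean(), bottom[:, W // 2:].mean(),
    ])


# Toy 64 x 64 gray-scale image: dark upper half, bright lower half.
image = np.zeros((64, 64))
image[32:, :] = 1.0
opacity = to_opacity(influence_map(image, toy_last_hidden))
```

With this toy image, blacking out a part in the already dark upper half leaves the stand-in vector unchanged, so only the parts in the lower half receive a non-zero opacity.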

Main XAI algorithm
We define each term as follows.   step making explanations). Now, we can use I_C to estimate which part of the image influenced the final prediction of the model. We use the last hidden layer output of the CNN instead of the final prediction, for the following reasons. First, there are more parameters in the last hidden layer than in the final output layer. Second, when the output value of the last hidden layer is influenced more by a difference in the input, the final prediction is also largely influenced (refer to Experimental Results). Moreover, when the number of neurons in the final output layer is small, the output of this layer can be influenced less. In other words, a large change in the output values of the last hidden layer means that the change of the input values largely influenced the entire model.
Our XAI method is described by Figs 3 and 4 and by the function GenerateExplanation of Algorithm 1. In Algorithm 1, the function grayscale(img) receives an image img, performs gray-scaling on img, and returns the gray-scaled image. The function predict(img, model, i) receives an input image img, a convolutional neural network model model, and the layer index i = 0, ..., layers − 1, where layers is the number of layers in model, including the input layer whose index i is 0 and the output layer whose index i is layers − 1. When the output of the layer with index i_out is used as the input of the next layer with index i_in, it is always true that i_out < i_in. The function Fill(img, y_0, y_1, x_0, x_1, opacity, explan) receives the original image img and integer values y_0, y_1, x_0, x_1, then copies img and fills a rectangular area of the copied image with black (RGB 0,0,0) (when explan = False) or purple (RGB 153,0,255) (when explan = True) with opacity opacity, whose range is 0.0 to 1.0 (when explan = False) or 0.75 (when explan = True), where the horizontal and vertical ranges of the area are from x_0 to x_1 (x_0 < x_1) and from y_0 to y_1 (y_0 < y_1), respectively.
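As an illustration of the Fill function described above, the following is a minimal sketch, not the code of Algorithm 1. It assumes that the image is an H x W x 3 RGB array of type uint8, that the fill is applied by simple alpha blending, and that the caller passes an opacity inside the range stated above; the lower-case name `fill` is used to mark it as a hypothetical re-implementation.

```python
import numpy as np

BLACK = np.array([0, 0, 0], dtype=np.float64)        # used when explan is False
PURPLE = np.array([153, 0, 255], dtype=np.float64)   # used when explan is True


def fill(img, y0, y1, x0, x1, opacity, explan):
    """Return a copy of img with rows y0:y1, columns x0:x1 blended with a color.

    Assumption: alpha blending, i.e. new = (1 - opacity) * old + opacity * color.
    """
    if not (y0 < y1 and x0 < x1):
        raise ValueError("expected y0 < y1 and x0 < x1")
    color = PURPLE if explan else BLACK
    out = img.astype(np.float64)        # astype returns a copy, img is untouched
    region = out[y0:y1, x0:x1]
    out[y0:y1, x0:x1] = (1.0 - opacity) * region + opacity * color
    return np.rint(out).astype(np.uint8)


# A white 4 x 4 RGB image with its top-left 2 x 2 area modified in two ways.
white = np.full((4, 4, 3), 255, dtype=np.uint8)
blacked = fill(white, 0, 2, 0, 2, opacity=1.0, explan=False)
shaded = fill(white, 0, 2, 0, 2, opacity=0.5, explan=True)
```

Returning a copy keeps the original image available for the later comparison with each modified image.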

Experiments and discussion
The names of the steps of our method (pre-training, modifying image, finding vector, computing difference and making explanations) are used throughout this section and its subsections; they are defined in the Overview section.
We used the test images and the pre-trained CNN model from the pre-training step for the experiment. That is, we input the test images into the pre-trained CNN and obtained the results. We set W = 64, H = 64, M = 8 and N = 8 for our experiment, so w and h are both 8. The programming language we used is Python 3.7, and the operating system is Windows 10.

Pre-training
We pre-trained the CNN with four image datasets downloaded from [27, 28]. Table 2 describes detailed dataset information. For example, the dataset named "Vehicle vs. Non-vehicle" [27] contains 7,325 images in total (3,900 non-vehicle images and 3,425 vehicle images). We split these images into training images and test images, with proportions of 0.8 and 0.2, respectively. So we have 3,120 training images and 780 test images for the non-vehicle category, and 2,740 training images and 685 test images for the vehicle category.
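The split counts above follow from applying the 0.8 / 0.2 proportions to each category separately, which can be checked with a short sketch. The helper name `split_counts` is hypothetical, and rounding the training count to the nearest integer is an assumption.

```python
def split_counts(total, train_fraction=0.8):
    """Return (training count, test count) for one category."""
    train = round(total * train_fraction)
    return train, total - train


non_vehicle = split_counts(3900)   # 3,900 non-vehicle images
vehicle = split_counts(3425)       # 3,425 vehicle images
```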
For each dataset, we trained the CNN model with all the training images and then evaluated it with Mean Square Error (MSE) and accuracy (the pre-training stage). The final output contains N elements, one for each category, where N is the number of categories in the dataset. We measured the accuracy as (correct count) / (total number of images), where correct count is the number of images for which the index of the largest element in the final output vector of the prediction is the same as in the ground truth.
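The two evaluation measures above can be sketched as follows. This is an illustration under stated assumptions: predictions and ground truth are arrays with one row per image and one column per category, the ground truth is one-hot encoded, and MSE is averaged over all elements; the function names are hypothetical.

```python
import numpy as np


def accuracy(pred, truth):
    """(correct count) / (total number of images), comparing argmax indices."""
    correct = np.argmax(pred, axis=1) == np.argmax(truth, axis=1)
    return float(np.mean(correct))


def mean_square_error(pred, truth):
    """Mean of the squared element-wise differences."""
    return float(np.mean((pred - truth) ** 2))


# Three images, two categories; the third prediction is wrong.
pred = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]])
truth = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
acc = accuracy(pred, truth)
mse = mean_square_error(pred, truth)
```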
The dataset "traffic signs with few classes" uses the traffic signs dataset [28] with a reduced number of classes. We marked each image as below, where "A: B" means that we marked an image as A if its original class is one of B. The dataset "traffic signs [28] (only speed limit signs)" also uses the traffic signs dataset [28], but we used only the 'speed limit' sign images, whose original classes are 0, 1, 2, 3, 4, 5, 7, and 8. We marked the images whose original class is 0, 1, 2, 3, 4, 5, 7, and 8 as class 0, 1, 2, 3, 4, 5, 6, and 7, respectively.
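The relabeling for the speed-limit-only dataset can be written as a small lookup table. The sketch below assumes that samples are (image, original class) pairs; the names `REMAP` and `relabel` are hypothetical.

```python
# Original classes of the 'speed limit' signs that are kept.
SPEED_LIMIT_CLASSES = [0, 1, 2, 3, 4, 5, 7, 8]

# Map each kept original class to its new consecutive class index.
REMAP = {orig: new for new, orig in enumerate(SPEED_LIMIT_CLASSES)}


def relabel(samples):
    """Keep only speed limit samples and replace their class by the new index."""
    return [(img, REMAP[cls]) for img, cls in samples if cls in REMAP]


# Placeholder strings stand in for images in this example.
kept = relabel([("img_a", 7), ("img_b", 6), ("img_c", 8), ("img_d", 0)])
```

Original class 6 is not among the listed speed limit classes, so the sample with that class is dropped.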

Experiment for XAI algorithm
We created the image explaining our experimental result using matplotlib [29].   Table 3 shows the correlation coefficients between each pair of variables that we consider meaningful. Table 4 describes each variable. So, the result in the left image of (B) indicates that our XAI model can produce meaningful results. For the vehicle image (right image of (A) and (B)), we can see that the middle and lower parts of the image have more influence than the other parts, because our model usually filled the upper parts with fainter purple and the lower parts with stronger purple. Specifically, the background parts at the top and left of the image had much less influence than the middle and lower parts. Unlike for the non-vehicle images, the center part largely influenced the final prediction and the last hidden layer output for the vehicle images. The reason is that we usually distinguish vehicles from non-vehicles by the shape and color of their body. That is, we recognize an object as something other than a vehicle if its shape and color do not match those of vehicles.
From Fig 6, we can see that for most of the images, the correlation between the average changes of the final output vector (O_dif) and of the last hidden layer output vector (dif_{n,m}) is clearly positive. No image shows a negative correlation. Among the four datasets we used, the maximum value of the correlation coefficient is between 0.95 and 1.0, from "Vehicle vs. Non-vehicle" and "traffic signs (only speed limit signs)". The minimum value is between 0.3 and 0.35, from "traffic signs with few classes". The dataset with the highest lh_o_corr is "traffic signs" (0.7668), and the lowest is "traffic signs with few classes" (0.5901).
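The per-image correlation discussed above can be sketched as follows. The text does not state which correlation coefficient is used, so the Pearson coefficient is an assumption here, as are the function name `part_correlation` and the representation of the per-part changes as two M x N arrays.

```python
import numpy as np


def part_correlation(dif, o_dif):
    """Pearson correlation between per-part changes of the last hidden layer
    output (dif) and of the final output (o_dif), for one image."""
    return float(np.corrcoef(np.ravel(dif), np.ravel(o_dif))[0, 1])


# Synthetic 8 x 8 example: the final output changes in step with the
# last hidden layer output, so the correlation is close to 1.
dif = np.arange(64, dtype=np.float64).reshape(8, 8)
o_dif = 0.5 * dif + 1.0
corr = part_correlation(dif, o_dif)
```

If either array is constant, for example when no modification changes the final output, the coefficient is undefined and NumPy returns NaN, so such images need separate handling.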
From Table 3, we can see that there are positive correlations among difCells, avgRank, maxRank, smax − dmin and O_dif/dif_{n,m} (group 1), and between difMin and max − 2nd (group 2), and negative ones between the two groups. is larger, the final prediction can change more easily, which indicates larger difCells. We define two cases of images here: Case 1 is the images whose final CNN prediction can be changed by small changes to them, and Case 2 is the opposite. Because smax − dmin is large when samMax is large and difMin is small, and the value of difMin for Case 2 is larger than Case 1, and the value of Rank is higher than Case 2 than Case 1, maxRank and smax − dmin have positive correlations, and they have negative ones with difMin. Because for Case 1 images with larger difMin values the two elements of the final prediction vector have a larger difference, which indicates larger max − 2nd values, difMin and max − 2nd have a positive correlation. lh_o_corr has a weak positive correlation with the variables from group 1, because the images   . The pattern of the correlation coefficients between every pair of variables mentioned above is consistent across all the datasets used for this experiment. Fig 7 compares our method with SHAP [6], LIME [7], Grad-CAM [8] and eXplainable CNN (XCNN) [9]. The methodologies used in these papers are described in the related work. We implemented and executed each algorithm in Python, and the code for them includes some code snippets from [30-33], respectively. Our code for this comparison is available in [34]. The core point of our model, which none of the compared methods has, is that one can directly see which part of the image influences the final prediction and the output of the last hidden layer (LHL), through the fill color of each part of the image. The highlighted parts in the resulting images of SHAP, LIME, Grad-CAM, XCNN, and our method all differ in most cases.
Another core point of our model is that, except for the cases where no modification can influence the final output prediction, our model does not fail for any image, like SHAP and unlike LIME and Grad-CAM. We can make the final explanation visually better by using other fill colors for the explanation made by the model. Also, by increasing the values of M and N, we can produce more detailed explanations.
SHAP, the image mask of LIME, Grad-CAM, and our method show the original images with explanations, so that one can easily find which part of the image influenced the final prediction. Among these methods, SHAP explains in detail and shows, with different colors, the parts that have positive influences (increasing the probability for a class) and negative influences (decreasing it). But it fails for some images, such as the 6th and 7th images from the left, whose SHAP results are 'completely blue'. LIME seems to fail for both the image mask and the heatmap because the original images are quite small (64x64). Because LIME uses segmentation of pixels, it is not well suited to small images, so the result is quite bad. In addition, there are some images (the 2nd and 8th from the left) for which LIME failed to make explanations. Grad-CAM highlights the area more smoothly than the other methods and seems successful, but it fails for the 3rd image from the left. In this image, it highlights not the circular sign with '30' but the triangular sign above it. XCNN seems to show only the 'texture' of the images, which is not a sufficient explanation.
For the images whose Dataset No. is 4 (dataset name: traffic signs (only speed limit signs), the rightmost three images), the classification result is decided by the digit(s) to the left of the rightmost '0'. That is because the class is decided by the number inside the sign, and the rightmost digit is always '0'. So, these digits are the most important features for deciding the class of the images, and XAI methods should highlight them. For example, if the image contains the speed limit sign with '80', XAI methods should highlight the '8' to the left of the '0'. Considering these images, only two methods (SHAP and our method) are successful. Grad-CAM is successful for only one of these three images (the 3rd image from the right). Consequently, because SHAP fails for some images as mentioned above, our method shows meaningful XAI performance for images of car-related objects.

Conclusion
We created an XAI model to see how the CNN model predicts the class of an image and which part of the image influences the final prediction of the CNN model more than other parts. Also, we designed the model to explain the influence of each part on the final prediction. Also, we defined, measured, and analyzed the variables about the details of the XAI model. As shown in the Experiments and Discussion section, the XAI model works well for vehicle images, non-vehicle images, and other car-related datasets.

Fig 7. Comparison of our method with some other methods. They include SHAP [6], LIME [7], Grad-CAM [8] and eXplainable CNN (XCNN) [9]. "Dataset No." is 1, 2, 3, and 4 for "Vehicle vs. Non-vehicle", "traffic signs", "traffic signs with few classes" and "traffic signs (only speed limit signs)", respectively, "Class No." is the ground truth of the class of each image, and "Image No." is the index of each image among all the test images from the dataset which includes this image.
https://doi.org/10.1371/journal.pone.0267282.g007
Our research can contribute to the CNNs used for self-driving cars by providing a simple and intuitive XAI system. The main contribution is a simplified XAI method for self-driving cars that supports the two uses below, and this simplicity is also the main difference between our method and the previous methods.
• First, our research can detect and help analyze the prediction pattern of the CNN model, so it can measure and ensure the credibility of the CNN model and thereby help to improve it. For example, we can run our model on images that contain both an object and the background. In this experiment, if the influence of the part corresponding to the object is much larger than that of the part corresponding to the background, we can see that the CNN model is credible.
• Second, when a self-driving car causes an accident, we can analyze how and why the CNN model failed to predict correctly. For example, suppose that the model predicted an object in an image as the "non-vehicle" class, but it is a vehicle. If this situation caused the accident, we can see it from the detailed result of running the image through the XAI model.